Identification and genetic diversity analysis of high-yielding charcoal rot resistant soybean genotypes

Charcoal rot, caused by Macrophomina phaseolina (Tassi) Goid., is one of the most devastating diseases of soybean in India. During 2018, 226 diverse soybean genotypes were evaluated for genetic resistance under hot-spot conditions. Of these, a subset of 151 genotypes was selected based on Percent Disease Incidence (PDI) and better agronomic performance. Of the 151 genotypes evaluated during 2019, 43 were selected based on PDI and superior agronomic performance for further field evaluation and molecular characterization. During 2020 and 2021, these 43 genotypes were evaluated for PDI, Area Under Disease Progress Curve (AUDPC), and grain yield. In 2020, genotype JS 20-20 showed the lowest PDI (0.42) and AUDPC (9.37), and the highest grain yield was recorded by genotype JS 21-05 (515.00 g). In 2021, genotype JS 20-20 again exhibited the lowest PDI (0.00) and AUDPC (0.00), and the highest grain yield was recorded in JS 20-98 (631.66 g). Across both years, JS 20-20 had the lowest PDI (0.21) and AUDPC (4.68), while grain yield was highest in JS 20-98 (571.67 g). Through MGIDI (multi-trait genotype-ideotype distance) analysis, JS 21-05 (G19), JS 22-01 (G43), JS 20-98 (G28) and JS 20-20 (G21) were identified as the ideotypes with respect to the traits evaluated. Two unique alleles, Satt588 (100 bp) on linkage group K (chromosome 9) and Sat_218 (200 bp) on linkage group H (chromosome 12), were specific to the two resistant genotypes JS 21-71 and DS 1318, respectively. Cluster analysis showed that the genotypes bred at Jabalpur were genetically more closely related.


Materials and methods
Preliminary screening for charcoal rot resistance. During 2018, a total of 226 soybean genotypes, including varieties, breeding lines and exotic accessions, were evaluated for charcoal rot resistance under hot-spot conditions at J.N.K.V.V., Jabalpur, India. The experimental design was an augmented block design with seven blocks. Of the 226 genotypes, a subset of 151 genotypes was selected based on disease reaction and better agronomic performance. This subset was evaluated in 2019 using an augmented block design with six blocks.
During both years, each genotype was sown in two rows, each three meters long. Four checks (JS 20-29, JS 335, JS 93-05 and JS 95-60) were repeated and randomized across the blocks. Disease was evaluated in terms of PDI at the R7 (physiological maturity) growth stage [34], using a 0 to 9 disease rating scale [35] (Table 1). Disease reaction on the susceptible checks ranged from susceptible to highly susceptible during both years, indicating high disease pressure at the experimental field site.
Selective screening for charcoal rot resistance. Of the 151 genotypes evaluated during 2019, 43 genotypes were selected based on disease reaction and superior agronomic performance for further field evaluation and molecular characterization. During 2020 and 2021, these 43 genotypes, along with five checks (JS 20-29, JS 335, JS 93-05, JS 95-60 and Dsb 21), were evaluated for PDI, AUDPC, and grain yield per plot (3.0 × 0.6 m²). The experiments were conducted in a randomized complete block design (RCBD) with three replications. To ensure high disease pressure and no disease escape, seeds were mixed with sorghum grain infected with M. phaseolina (10 g per genotype per replication) before sowing. Prior to mass multiplication of the pathogen, pathogenicity of the isolate (Fig. S1) was confirmed through a cut-stem inoculation technique [36]. PDI was recorded during pod development, seed filling and at physiological maturity (between the R6 and R7 growth stages) by recording the number of dead plants in each plot. To evaluate the genotypes based on AUDPC, progressive development of the disease was recorded during the reproductive stages of soybean at 45, 60, 75 and 90 days after sowing.
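As an illustration of how the four scoring dates can be turned into a single AUDPC value per plot, the following Python sketch applies the trapezoidal rule to incidence values at 45, 60, 75 and 90 days after sowing. It rests on two assumptions that are not stated in the text: that PDI is the number of dead plants expressed as a percentage of the plants in the plot, and that the AUDPC of the cited reference is the usual trapezoidal form. The plot size and plant counts are hypothetical.

```python
def percent_disease_incidence(dead_plants, total_plants):
    """PDI as dead plants expressed as a percentage of all plants (assumed definition)."""
    if total_plants <= 0:
        raise ValueError("total_plants must be positive")
    return 100.0 * dead_plants / total_plants


def audpc(incidence, days):
    """Trapezoidal AUDPC: mean incidence of each interval times the interval length."""
    if len(incidence) != len(days) or len(days) < 2:
        raise ValueError("need matching lists with at least two observations")
    total = 0.0
    for i in range(len(days) - 1):
        mean_incidence = (incidence[i] + incidence[i + 1]) / 2.0
        total += mean_incidence * (days[i + 1] - days[i])
    return total


# Hypothetical plot of 40 plants scored at the four dates used in the study.
days = [45, 60, 75, 90]
dead = [0, 2, 5, 8]
pdi = [percent_disease_incidence(d, 40) for d in dead]
plot_audpc = audpc(pdi, days)
print(pdi, plot_audpc)
```

A genotype with no dead plants at any date yields an AUDPC of zero, which is the pattern reported for the most resistant entry in 2021.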
AUDPC was calculated as per [37]:
$$\mathrm{AUDPC} = \sum_{i=1}^{n-1} \frac{y_i + y_{i+1}}{2}\,(t_{i+1} - t_i)$$
where $y_i$ = per cent incidence of charcoal rot at the ith observation, $t_i$ = time (days) at the ith observation, and n = number of observations.
Diversity analysis of soybean genotypes. Diversity analysis of the 48 soybean genotypes under study was performed using SSR markers developed by [38] (table). Plant genomic DNA was extracted using the CTAB (cetyltrimethylammonium bromide) method [39]. The purified genomic DNA was quantified using a nanodrop spectrophotometer (DeNovix DS-11+), and DNA quality was checked by 0.8% agarose gel electrophoresis. Polymorphism among the genotypes was determined using 59 SSR markers distributed across the 20 soybean linkage groups (https://soybase.org/). For marker analysis, the purified genomic DNA was amplified by PCR in a 10 µl reaction mixture containing 1.0 µl DNA (50-70 ng/µl), 1 µl 10× PCR master mix, 0.6 µl each of forward and reverse SSR primers (100 ng/µl), and 6.8 µl molecular-grade water. Amplification was carried out in a thermocycler (Applied Biosystems, USA) using standard protocol conditions: initial denaturation at 94 °C for 5 min, denaturation (94 °C) for 40 s, annealing (55 °C) for 1 min, extension (72 °C) for 1 min, and final extension (72 °C) for 7 min. Amplified SSR products were resolved on 3.5% MetaPhor agarose (Lonza, Switzerland), and fragment sizes were estimated using a 50 bp DNA ladder. For each polymorphic marker, the number of alleles present was recorded in each genotype. Bands were scored as 1 (presence) or 0 (absence) for each allele, and missing bands were scored as 9. Polymorphic information content (PIC) and expected heterozygosity (H) values show the discriminating ability of a marker based on the number of known alleles and their frequency distribution. The PIC value for each marker was calculated using the formula given by [40].
In this formula, $P_i$ denotes the frequency of the ith allele among the genotypes analyzed, calculated for each SSR locus. Jaccard's similarity coefficient was used to estimate the genetic similarity among genotypes. The resulting similarity matrix was then analyzed using the unweighted pair-group method with arithmetic average (UPGMA) clustering algorithm to construct a dendrogram.
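The marker statistics described above can be sketched in a few lines of Python. The PIC formula itself did not survive in the text, so the function below assumes the widely used simplified form PIC = 1 − Σ P_i²; the cited reference may use a different expression. The Jaccard function follows the 1/0/9 scoring scheme, and skipping positions scored as 9 is also an assumption. The allele frequencies and band profiles are hypothetical.

```python
def pic(allele_frequencies):
    """Simplified PIC = 1 - sum(P_i^2); assumed form, not confirmed by the source."""
    return 1.0 - sum(p * p for p in allele_frequencies)


def jaccard_similarity(profile_a, profile_b, missing=9):
    """Jaccard similarity of two 0/1 band profiles, ignoring positions scored as missing."""
    shared = 0   # bands present in both genotypes
    either = 0   # bands present in at least one genotype
    for a, b in zip(profile_a, profile_b):
        if a == missing or b == missing:
            continue
        if a == 1 and b == 1:
            shared += 1
        if a == 1 or b == 1:
            either += 1
    return shared / either if either else 0.0


# Hypothetical marker with three alleles and two hypothetical band profiles.
marker_pic = pic([0.5, 0.3, 0.2])
genotype_1 = [1, 0, 1, 1, 0, 9]
genotype_2 = [1, 1, 0, 1, 0, 1]
similarity = jaccard_similarity(genotype_1, genotype_2)
print(round(marker_pic, 3), similarity)
```

A full pairwise matrix of such similarities, converted to distances as 1 − similarity, is the kind of input an average-linkage (UPGMA) clustering routine expects.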
Statistical analyses. Prior to analysis, data for Percent Disease Incidence (PDI) were transformed using the arcsine transformation [41] to normalize the residuals. Analysis of the augmented randomized complete block design was carried out using the R package "augmentedRCBD" [42]. The Least Significant Difference (LSD) test was carried out using the R package "agricolae" [43]. Analysis of variance, estimation of variance components and heritability, and MGIDI analysis were done using the R package "metan" [27]. A phylogenetic tree was constructed from the genotypic data of polymorphic SSR markers using NTSYSpc version 2.2 [44], on the basis of genetic distances.
Bioethical statement. We confirm that all local, national and international guidelines and legislation were adhered to for the use of plants in this study (https://www.nature.com/srep/journal-policies/editorial-policies#research-involving-plants).
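For readers unfamiliar with the arcsine transformation of percentage data, a minimal Python sketch is given here. It assumes the common angular form, arcsin(√(p/100)) expressed in degrees; the text does not state which variant of the cited reference was applied, and the function name is illustrative only.

```python
import math


def arcsine_transform(percent):
    """Angular transformation of a percentage: asin(sqrt(p / 100)) in degrees (assumed form)."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must lie between 0 and 100")
    return math.degrees(math.asin(math.sqrt(percent / 100.0)))


# Hypothetical PDI values; the transformation stretches values near 0 and 100.
transformed = [arcsine_transform(p) for p in (0.0, 25.0, 50.0, 100.0)]
print([round(v, 2) for v in transformed])
```

The transformed values, rather than the raw percentages, would then be passed to the ANOVA.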

Results
Large-scale screening of soybean germplasm for charcoal rot resistance under sick-plot conditions. During (Table S2 and Fig. S1). The ANOVA showed a significant genotypic effect (p < 0.001) for PDI during both years (Table S3). The LSD test revealed that the genotypes varied significantly from each other (p < 0.05) for PDI during both years of experimentation (Table S2). (Table 2).
ANOVA, LRT, variance components and genetic parameters for the traits under study. The ANOVA for each year (2020 and 2021) revealed that the genotypic effects for the three traits under study were significant at p < 0.001 (Table S4). Pooled ANOVA across years indicated that the genotypic effect, environmental effect and G × E interaction effect were highly significant (p < 0.001) for the three traits (Table 3). For AUDPC, the genotypic effect contributed 93.6% of the total variation, followed by the G × E interaction effect (4.6%) and the environmental effect (1.22%). For PDI, the largest portion of the total variation was explained by the genotypic effect (92.72%), followed by the G × E interaction effect (3.59%) and the environmental effect (1.56%). Similarly, 86.22% of the total variation for grain yield was explained by the genotypic effect, followed by the G × E interaction effect (5.29%) and the environmental effect (0.07%). The Likelihood Ratio Test (LRT) revealed highly significant genotype and G × E interaction effects (p < 0.001) for all three traits under study (data not shown).
The variance components and genetic parameters of the traits under study across both years are presented in Table S5. In both years, for all three traits, genotypic variance was higher than environmental variance. Heritability estimates were high for all the traits under study. The genotypic coefficient of variation (CVg) was high for all three traits in both years. The residual coefficient of variation (CVr) was medium for PDI and AUDPC and low for grain yield in both years.
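To make the relationship between these quantities concrete, the sketch below computes heritability and the two coefficients of variation from variance components using common textbook definitions. Whether these match the exact estimators reported by the "metan" package is an assumption, and all numeric inputs are hypothetical rather than values from Table S5.

```python
import math


def genetic_parameters(var_g, var_ge, var_e, trait_mean, n_env, n_rep):
    """Textbook genetic parameters from variance components (assumed definitions)."""
    var_p = var_g + var_ge + var_e  # phenotypic variance on a plot basis
    return {
        # broad-sense heritability on a plot basis
        "h2": var_g / var_p,
        # heritability on a genotype-mean basis across environments and replicates
        "h2_mean": var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep)),
        # genotypic and residual coefficients of variation, in percent
        "cv_g": 100.0 * math.sqrt(var_g) / trait_mean,
        "cv_r": 100.0 * math.sqrt(var_e) / trait_mean,
    }


# Hypothetical components for a trait measured in 2 years with 3 replications.
params = genetic_parameters(var_g=64.0, var_ge=4.0, var_e=12.0,
                            trait_mean=40.0, n_env=2, n_rep=3)
print({name: round(value, 3) for name, value in params.items()})
```

In this hypothetical case the genotypic variance dominates, so heritability is high and CVg exceeds CVr, the same qualitative pattern described for the three traits.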
Genotypic BLUP values for PDI, AUDPC and Grain yield. Genotypic BLUP values for the traits PDI, AUDPC and Grain yield across two years are shown in Table S5 (Table S6).
Identification of ideotypes using MGIDI index. Using the MGIDI index at a 10% selection intensity, genotypic selection was carried out based on multiple traits simultaneously (Table 3). A lower value was desirable for AUDPC and PDI, and a higher value was desirable for grain yield. Across both the years, percentage selection differential (Fig. 3).
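The selection differential reported for the MGIDI selection follows from the quantities named in the Table 4 caption: Xo (original population mean) and Xs (mean of the selected genotypes). A short Python sketch, assuming SD = Xs − Xo and SDperc = 100 × SD / Xo, is shown here with hypothetical means; the function name is illustrative.

```python
def selection_differential(original_mean, selected_mean):
    """Return (SD, SDperc) assuming SD = Xs - Xo and SDperc = 100 * SD / Xo."""
    if original_mean == 0:
        raise ValueError("original_mean must be non-zero to express SD in percent")
    sd = selected_mean - original_mean
    return sd, 100.0 * sd / original_mean


# Hypothetical means: grain yield should increase, PDI should decrease.
yield_sd, yield_sd_perc = selection_differential(400.0, 500.0)
pdi_sd, pdi_sd_perc = selection_differential(20.0, 5.0)
print(yield_sd, yield_sd_perc, pdi_sd, pdi_sd_perc)
```

A negative percentage is therefore the desired outcome for PDI and AUDPC, and a positive one for grain yield.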
Structure analysis. The 59 SSR markers were also used to study the population structure of the 48 genotypes. A sharp peak in ΔK at K = 2 suggested the presence of two major populations (Fig. 4a). (Table 7). Population 1 consisted mostly of genotypes bred at Jabalpur (06), followed by genotypes developed at Indore (03), germplasm lines (03), and genotypes developed at Amravati (02) and Pantnagar (02). Population 2 also consisted mostly of genotypes developed at Jabalpur (13), followed by germplasm lines (05), genotypes (Fig. 4b).
Table 4. Predicted genetic gains for the traits PDI (Percent Disease Incidence), AUDPC (Area Under Disease Progress Curve) and grain yield across the years 2020 and 2021 using the MGIDI index. Xo, the original population mean; Xs, the mean of the selected genotypes; SD and SDperc, the selection differential and the selection differential in percentage, respectively.

Discussion
Although charcoal rot is a major fungal disease in India, no systematic study on the identification of high-yielding, charcoal rot resistant genotypes has been carried out to date. The purpose of the current study was to identify high-yielding charcoal rot resistant genotypes, based on grain yield and resistance reaction under high disease pressure, and to characterize these genotypes using SSR markers. Above-ground charcoal rot symptoms start to appear from the R4 stage (2 cm long pod at one of the four uppermost nodes with a completely unrolled leaf). Colonization of soybean by M. phaseolina has been observed to increase slowly during the vegetative and early reproductive stages and to peak during the R5 (beginning seed) to R7 growth stages [6]. Therefore, in this study, AUDPC was recorded during the reproductive stages and PDI was recorded at the R7 stage, the ideal growth stage for evaluating charcoal rot resistance [7].
Lower levels of the residual coefficient of variation (CVr) indicate better experimental precision. In this study, CVr was low for grain yield and intermediate for AUDPC and PDI, indicating relatively uniform disease pressure across the experimental site. Higher genotypic variance and heritability estimates indicate a higher response to selection for the traits under study. The pooled ANOVA revealed that genotypic variance contributed predominantly to the total variation. A significant G × E interaction effect was observed, indicating that the genotypes did not respond uniformly to the disease across environments. Improvement in any economic trait depends on understanding its mode of inheritance and heritability. No extensive studies related to soybean charcoal rot resistance have been carried out [12]. A polygenic mode of inheritance for resistance to M. phaseolina in soybean was reported in a few studies [12,18]. One QTL on chromosome 15 and two QTLs on chromosome 16 governing charcoal rot resistance in soybean were mapped using F2:3-derived lines from the cross PI 567562A (R) × PI 567437 (S) [12]. In other crops such as sorghum [45] and common bean [46], epistatic interactions were observed. Inadequate information on the genetic mechanisms underpinning resistance, together with significant environmental effects, has hindered progress in breeding for resistance [14]. Nevertheless, transgressive segregation in progeny derived from resistant parents can be useful for identifying novel and durable sources of resistance [47]. In the current study, based on the pooled data, JS 20-20 was identified as the best genotype for mapping genes/QTLs governing charcoal rot resistance. JS 20-98 was identified as an appropriate parent for breeding for higher yield and resistance under high disease pressure.
The MGIDI index, used in this study to select ideal resistant genotypes based on multiple traits, has previously been effective in selecting for yield [48] and quality traits [49]. In the current study, JS 21-05, JS 22-01, JS 20-98 and JS 20-20 were identified as the ideotypes based on PDI, AUDPC and grain yield. These genotypes were determined to be potentially high-yielding sources of charcoal rot resistance. Hybridization among these ideotypes can result in the selection of superior segregants with higher yield and charcoal rot resistance.
In addition, traits such as 100-seed weight, plant height, number of nodes, number of branches, biomass and harvest index should be considered in future studies to identify traits associated with yield and disease indices under high disease pressure, similar to grain mold resistance in sorghum [50,51] and fall armyworm resistance in maize [52].
SSR markers have been widely used to assess variation when screening soybean germplasm [53,54]. In our study, 59 markers distributed uniformly across the 20 linkage groups were used for molecular characterization. The high percentage of polymorphism and the high mean PIC value detected in this study are consistent with previous studies [31,55,56]. However, the low number of alleles per locus indicates a relatively narrow genetic base among the genotypes used in this study.
In the current study, Satt373 had a PIC value of 0.619 with 6 alleles, and Sat_244 had 5 alleles with a high PIC value of 0.737. However, the marker Satt440, with a PIC value > 0.6 and the highest number of alleles (4), denotes a strong correlation between PIC value and allele richness. Two unique alleles that can identify different resistant genotypes were identified in this study. Cluster analysis indicated that the majority of genotypes developed at Jabalpur were grouped under a single cluster, IIb2. This indicates the genetic relatedness and narrow genetic base of the genotypes bred at Jabalpur. Except for AMS-MB-5-18, the genotypes included in POP 1 in the structure analysis were included in cluster IVb, indicating consistency between cluster analysis and structure analysis in determining genetic relatedness among the genotypes evaluated in this study.
In India, apart from charcoal rot, Rhizoctonia aerial blight (RAB), yellow mosaic virus (YMV) and anthracnose are the predominant diseases that can cause significant yield losses. Mega-varieties such as JS 95-60 and JS 93-05 are highly susceptible to all these diseases. Genotypes such as JS 21-71, JS 21-72, JS 21-05, JS 21-17, PS 1611, JS 20-98 and JS 20-20, which were identified as resistant to charcoal rot in the current study, have also been reported as RAB resistant [57-59]. Genotypes JS 20-98, JS 21-05, JS 21-17 and PS 1611 were reported to be YMV resistant [58,60], while JS 20-98 and PS 1611 were reported to be anthracnose resistant [59,61]. These genotypes can be used as parents to develop multiple disease resistant varieties that can play a crucial role in enhancing soybean productivity in India.

Conclusion
In the current study, JS 20-20 was identified as the best genotype for resistance, and JS 20-98 was superior in grain yield under high disease pressure. Genotypes JS 21-05, JS 22-01, JS 20-98 and JS 20-20 were identified as ideotypes with respect to AUDPC, PDI and grain yield for charcoal rot resistance. These genotypes will be used as parents to develop high-yielding charcoal rot resistant varieties. Two unique alleles, Satt588 (100 bp) and Sat_218 (200 bp), were specific to the two resistant genotypes JS 21-71 and DS 1318, respectively. In the molecular diversity study, JS 20-20 formed a distinct cluster and may therefore be useful in resistance gene mapping and characterization studies. The clustering pattern showed that the genotypes bred at Jabalpur were genetically more closely related to each other than to the other genotypes.

Data availability
All data generated or analyzed during this study are included in this published article and its supplementary information files.